Joint Vertex Degrees in an Inhomogeneous Random Graph Model
In a random graph, counts for the number of vertices with given degrees will
typically be dependent. We show via a multivariate normal and a Poisson process
approximation that, for graphs which have independent edges, with a possibly
inhomogeneous distribution, only when the degrees are large can we reasonably
approximate the joint counts as independent. The proofs are based on Stein's
method and the Stein-Chen method with a new size-biased coupling for such
inhomogeneous random graphs, and hence bounds on distributional distance are
obtained. Finally we illustrate that apparent (pseudo-) power-law type
behaviour can arise in such inhomogeneous networks despite not actually
following a power-law degree distribution. Comment: 30 pages, 9 figures
The practical use of the A* algorithm for exact multiple sequence alignment
Multiple alignment is an important problem in computational biology. It is well known that it can be solved exactly by a dynamic programming algorithm, which in turn can be interpreted as a shortest path computation in a directed acyclic graph. The A* algorithm (or goal-directed unidirectional search) is a technique that speeds up the computation of a shortest path by transforming the edge lengths without losing the optimality of the shortest path. We implemented the A* algorithm in a computer program similar to MSA~\cite{GupKecSch95} and FMA~\cite{ShiIma97}. We incorporated in this program new bounding strategies for both lower and upper bounds, and show that the A* algorithm, together with our improvements, can speed up computations considerably. Additionally, we show that the A* algorithm together with a standard bounding technique is superior to the well-known Carrillo-Lipman bounding, since it excludes more nodes from consideration.
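The edge-length transformation mentioned in this abstract can be illustrated on a much smaller problem. The following is a minimal sketch, not the authors' program: it assumes pairwise (not multiple) alignment with unit edit costs, and the function name `astar_edit_distance` and the heuristic `h` are hypothetical choices made for illustration. Ordering nodes by g + h is equivalent to running Dijkstra's algorithm on the reduced edge lengths w(u,v) - h(u) + h(v), which stay non-negative because h is a consistent lower bound.

```python
import heapq


def astar_edit_distance(a, b, use_heuristic=True):
    """Shortest path from (0, 0) to (len(a), len(b)) in the edit graph.

    Returns (distance, number_of_expanded_nodes).
    """
    n, m = len(a), len(b)

    def h(i, j):
        # Assumed heuristic: the remaining suffixes differ in length by this
        # amount, so at least that many gaps (cost 1 each) are still needed.
        if not use_heuristic:
            return 0  # h = 0 turns the search into plain Dijkstra
        return abs((n - i) - (m - j))

    dist = {(0, 0): 0}
    heap = [(h(0, 0), 0, 0)]  # entries are (g + h, i, j)
    closed = set()
    while heap:
        _, i, j = heapq.heappop(heap)
        if (i, j) in closed:
            continue
        closed.add((i, j))
        if (i, j) == (n, m):
            return dist[(i, j)], len(closed)
        g = dist[(i, j)]
        steps = []
        if i < n and j < m:  # diagonal edge: match (0) or mismatch (1)
            steps.append((i + 1, j + 1, 0 if a[i] == b[j] else 1))
        if i < n:  # gap in b
            steps.append((i + 1, j, 1))
        if j < m:  # gap in a
            steps.append((i, j + 1, 1))
        for ni, nj, cost in steps:
            ng = g + cost
            if ng < dist.get((ni, nj), float("inf")):
                dist[(ni, nj)] = ng
                heapq.heappush(heap, (ng + h(ni, nj), ni, nj))
    raise ValueError("target node not reachable")


if __name__ == "__main__":
    print(astar_edit_distance("kitten", "sitting"))
    print(astar_edit_distance("kitten", "sitting", use_heuristic=False))
```

Both calls return the same distance. The second element of each result shows how many nodes were expanded with and without the heuristic, which is the quantity that bounding strategies aim to reduce.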
Fast and accurate read mapping with approximate seeds and multiple backtracking
We present Masai, a read mapper representing the state of the art in terms of speed and accuracy. Our tool is an order of magnitude faster than RazerS 3 and mrFAST, and 2-4 times faster and more accurate than Bowtie 2 and BWA. The novelties of our read mapper are filtration with approximate seeds and a method for multiple backtracking. Approximate seeds, compared with exact seeds, increase filtration specificity while preserving sensitivity. Multiple backtracking amortizes the cost of searching a large set of seeds by taking advantage of the repetitiveness of next-generation sequencing data. Combined, these two methods significantly speed up approximate search on genomic data sets. Masai is implemented in C++ using the SeqAn library. The source code is distributed under the BSD license and binaries for Linux, Mac OS X and Windows can be freely downloaded from http://www.seqan.de/projects/masai
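The idea behind approximate seeds can be sketched with the pigeonhole principle. This is an illustrative assumption, not Masai's implementation: if a read with at most k errors is cut into s non-overlapping seeds, at least one seed carries at most floor(k / s) errors, so fewer and longer seeds can be searched with a small error budget instead of k + 1 short exact seeds. The sketch considers substitution errors only, and all function names are hypothetical.

```python
from itertools import combinations


def split_into_seeds(read_length, num_seeds):
    """Cut [0, read_length) into num_seeds contiguous half-open intervals."""
    base, extra = divmod(read_length, num_seeds)
    bounds, start = [], 0
    for s in range(num_seeds):
        end = start + base + (1 if s < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def seed_error_threshold(max_errors, num_seeds):
    """Errors that must be tolerated per seed so that no match is lost."""
    return max_errors // num_seeds


def min_errors_in_a_seed(error_positions, bounds):
    """Smallest number of errors falling into any single seed."""
    return min(
        sum(1 for p in error_positions if lo <= p < hi) for lo, hi in bounds
    )


def guarantee_holds(read_length, num_seeds, max_errors):
    """Brute-force check over every placement of up to max_errors errors."""
    bounds = split_into_seeds(read_length, num_seeds)
    limit = seed_error_threshold(max_errors, num_seeds)
    return all(
        min_errors_in_a_seed(errors, bounds) <= limit
        for r in range(max_errors + 1)
        for errors in combinations(range(read_length), r)
    )


if __name__ == "__main__":
    # Two seeds, up to three errors: some seed has at most one error.
    print(split_into_seeds(12, 2), seed_error_threshold(3, 2))
    print(guarantee_holds(12, 2, 3))
```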
RazerS 3: Faster, fully sensitive read mapping
Motivation: In recent years, next-generation sequencing (NGS) has become a key technology for many applications in the biomedical sciences. Throughput continues to increase and new protocols provide longer reads than currently available. In almost all applications, read mapping is a first step. Hence, it is crucial to have algorithms and implementations that run fast, with high sensitivity, and are able to deal with long reads and a large absolute number of indels.
Results: RazerS is a read mapping program with adjustable sensitivity based on counting q-grams. In this work we propose the successor RazerS 3, which now supports shared-memory parallelism, an additional seed-based filter with adjustable sensitivity, a much faster, banded version of Myers’ bit-vector algorithm for verification, memory-saving measures and support for the SAM output format. This leads to much improved performance for mapping reads, in particular long reads with many errors. We extensively compare RazerS 3 with other popular read mappers and show that its results are often superior in terms of sensitivity while exhibiting practical and often competitive run times. In addition, RazerS 3 works without a precomputed index.
Availability and Implementation: Source code and binaries are freely available for download at http://www.seqan.de/projects/razers. RazerS 3 is implemented in C++ and OpenMP under a GPL license using the SeqAn library and supports Linux, Mac OS X, and Windows
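The q-gram counting on which RazerS is based can be illustrated with the classical q-gram lemma: a read of length m that matches a reference window with at most k errors shares at least m + 1 - q(k + 1) q-grams with it. The following is a minimal sketch of such a filter, assuming a simple multiset comparison against one window. It is not the RazerS implementation, and the function names are hypothetical.

```python
from collections import Counter


def qgrams(s, q):
    """All overlapping substrings of length q."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def qgram_threshold(m, q, k):
    """q-gram lemma: minimum number of shared q-grams for a match with <= k errors."""
    return m + 1 - q * (k + 1)


def shared_qgrams(read, window, q):
    """Size of the multiset intersection of the two q-gram collections."""
    common = Counter(qgrams(read, q)) & Counter(qgrams(window, q))
    return sum(common.values())


def passes_filter(read, window, q, k):
    """True if the window cannot be ruled out and needs verification."""
    return shared_qgrams(read, window, q) >= qgram_threshold(len(read), q, k)


if __name__ == "__main__":
    read = "ACGTACGTAC"
    print(qgram_threshold(len(read), 3, 1))
    print(passes_filter(read, "ACGTTCGTAC", 3, 1))  # one substitution
    print(passes_filter(read, "TTTTTTTTTT", 3, 1))  # unrelated window
```

Windows that pass the filter are only candidates. A verification step, such as the banded bit-vector algorithm named in the abstract, would still have to confirm each one.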
Segment-based multiple sequence alignment
Motivation: Many multiple sequence alignment tools have been developed in the past, progressing either in speed or alignment accuracy. Given the importance and widespread use of alignment tools, progress in both categories is a contribution to the community and has driven research in the field so far.
Results: We introduce a graph-based extension to the consistency-based, progressive alignment strategy. We apply the consistency notion to segments instead of single characters. The main problem we solve in this context is to define segments of the sequences in such a way that a graph-based alignment is possible. We implemented the algorithm using the SeqAn library and report results on amino acid and DNA sequences. The benefit of our approach is threefold: (1) sequences with conserved blocks can be rapidly aligned, (2) the implementation is conceptually easy, generic and fast and (3) the consistency idea can be extended to align multiple genomic sequences.
Availability: The segment-based multiple sequence alignment tool can be downloaded from http://www.seqan.de/projects/msa.html. A novel version of T-Coffee interfaced with the tool is available from http://www.tcoffee.org. Usage of the tool is described in the documentation of both.
Contact: [email protected]
RazerS - Fast Read Mapping with Sensitivity Control
Second-generation sequencing technologies deliver DNA sequence data at unprecedentedly high throughput. Common to most biological applications is a mapping of the reads to an almost identical or highly similar reference genome. Due to the large amounts of data, efficient algorithms and implementations are crucial for this task. We present an efficient read mapping tool called RazerS. It allows the user to align sequencing reads of arbitrary length using either the Hamming distance or the edit distance. Our tool can work either losslessly or, at higher speeds, with a user-defined loss rate. Given the loss rate, we present an approach that guarantees not to lose more reads than specified. This enables the user to adapt to the problem at hand and provides a seamless tradeoff between sensitivity and running time.
- …